Announcements¶

Quiz 2

  • Quiz 2 is next Wednesday, February 21 in discussion section.
  • Quiz 2 topics: Bitcoin/cryptocurrency material in Lectures 5-8 and NBFMG textbook chapters 2, 3, and 5.
  • Note that next Monday is a holiday, and next Wednesday is a make-up Monday. (Relatedly: there won't be any office hours next Wednesday.)

Assignments due today

  • Reading Assessment 4 is due today. It covers NBFMG Chapter 5.
  • Programming Assignments 3 and 4 are due today.

New assignments

  • There is a new Reading Assessment 5 posted today, and due next Thursday 2/22.
  • There is no new programming assignment for this week.

Lecture 8: Features of Bitcoin and Ethereum¶

Learning Outcomes¶

By the end of today's lecture, you will learn:

  • How pseudorandom generation allows Bitcoin users to generate many secret keys from a single secret.
  • How to run computer scripts in Bitcoin and another cryptocurrency called Ethereum.
  • The common knowledge property that arises from reaching agreement.

Recap from last time¶

Bitcoin was invented in 2009 by an anonymous party (or parties) holding the pseudonym Satoshi Nakamato.

Cryptocurrencies like Bitcoin have been carefully designed to ensure:

  • Censorship resistance through its use of a randomized election system (i.e., a lottery) to choose the next miner.
  • Consistency through its rule that miners and nodes should always choose the largest blockchain (or more precisely, the one with the most cumulative difficulty), with the tiebreaker being to choose the chain you heard first.
  • Double-spending protection through its proof of work puzzles that are computationally burdensome to solve. The world only accepts a block if it passes the difficulty check.

Bitcoin requires the following assumption:

The majority of the computational power used for mining (at any given time) is controlled by honest miners.

If this assumption holds, then Bitcoin achieves the following two security goals:

  1. Liveness: Every transaction is eventually finalized by all honest nodes.
  2. Safety: Honest nodes do not finalize different blocks at the same height.

There are three types of participants within Bitcoin.

  • A light client only needs to hold one or more secret keys for a digital signature scheme, plus the block headers.
  • A node additionally stores the entire state of the blockchain.
  • A miner additionally tries to write new blocks on the blockchain.

Nodes and miners can verify that transactions and blocks are valid. Light clients can verify some of this with the help of a node, using the simplified payment verification (SPV) protocol.

8.1 Pseudorandom Generation¶

Next, let's explore in detail one point that I briefly mentioned when defining light clients.

A light client only needs to hold one or more secret keys.

Why might you want to have more than one secret key in Bitcoin?

Let's consider the following scenario: suppose you are Alice, and you already have some Bitcoins. This means that you already have created secret/public keys for a digital signature scheme.

Then you have two interactions throughout the day.

  • In the morning, you visit a coffee shop and pay its store owner Bob in Bitcoin. This requires posting your public key and a digital signature to the globally-viewable blockchain.
  • In the afternoon, you talk to your friend Carol, and she decides to send you some money. To do so, she's asking you to give her a public key that she can (hash and) use as a destination address for the transaction.

What should you tell Carol?

You already have the secret/public keys from your old wallet. It's perfectly acceptable to reuse that one.

But there is a downside: now the entire world will see that the same person who went to the coffee shop in the morning is the person who is receiving coins right now. That is: the two transactions would be linkable.

A better approach is to run KeyGen again and generate a fresh secret/public key pair for this new transaction. This would provide some measure of pseudonymity.

(We will discuss the level of anonymity that Bitcoin achieves -- or fails to achieve -- in a future lecture. But still, this is better than nothing.)

Secret signing keys are quick and easy to create. But if you do this every day, quickly you will find yourself needing to remember a lot of secret keys, and that sounds cumbersome and error-prone. How can we make your life easier?

Quality of life improvement #1: generating many keys from one¶

At its core, the problem here is that Alice is generating many secret signing keys at random. They have no connection to each other, so she needs to store them all.

On the other hand, randomness was essential for the digital signature scheme to work. It is crucial that everyone chooses the secret key from a very large space, so that an attacker cannot brute force it. If everyone picked the same secret key for instance, then the signature scheme falls apart.

So here we reach a conundrum: we want a process that Alice can use to reliably and deterministically produce secret keys, but also one that "looks random" to an outsider.

Is there anything in our cryptographic toolbox that allows us to do this?

A cryptographic hash function $H$ provides this kind of property. Here's one way that we can leverage it. Suppose we have one single master key $m$ that we ask Alice to memorize. She can use it to generate many child keys pseudorandomly.

  • For her morning interaction with Bob, she can use $H(m, 0)$ to generate the first secret key. (And then Alice will generate the public key from the secret key, as usual.)
  • For her afternoon interaction with Carol, she can use $H(m, 1)$ to generate the second secret key (and then derive its corresponding public key).

This idea is effective thanks to the properties of the underlying hash function.

  • Due to preimage resistance, even if one secret key happens to be compromised and exposed to the world, nobody can use it to reverse-engineer the master key $m$.

  • Due to the random oracle property, the rows of the truth table of $H$ appear to be independent. So the two secret keys -- and their associated public keys -- have no discernible connection to each other.

We can take this idea one step further: from a single master key $m$, we can generate several keys for different cryptocurrencies like Bitcoin, Ethereum, ZCash, and more. (We will talk about alternative cryptocurrencies later today and later in the semester.) Then we can apply the previous idea to those keys in order to generate many secret/public key pairs.

This is the basic idea of a Bitcoin Improvement Protocol, or BIP, called hierarchical deterministic wallets. With HD wallets, you only have to remember a single master key $m$, from which you can (re)generate as many signing keys for as many cryptocurrencies as you want.

BIP-32 standard for hierarchical deterministic wallets

[Image source: BIP-32 specification]

Some important caveats before we continue!

  • The actual function you need to run here is not quite $H$, but rather a slightly more complicated construction called HMAC applied to the hash function.
  • And there's more work required to go from the hash digest to something that has the structure of a digital signing key.
  • And there are even more subtle details that are very important in practice, but out of scope for this lecture.

Remember: always make sure to use well-written crypto software that is developed by people who have studied these subtle details. Don't roll your own crypto software in practice!

Quality of life improvement #2: a human-memorable seed¶

We have made Alice's life somewhat easier: now all she has to do is remember a 32 byte master key. It might look something like this:

In [9]:
from os import urandom
from binascii import hexlify

master_key = urandom(32)
print("Base 16 output:", hexlify(master_key))

from base64 import b64encode
print("Base 64 output:", b64encode(master_key))
Base 16 output: b'66024b0c628148b58a03cc1ce718c468b80b98d08fc903273ecca0d76544a4e6'
Base 64 output: b'ZgJLDGKBSLWKA8wc5xjEaLgLmNCPyQMnPsyg12VEpOY='

As we have seen in previous lectures and programming assignments, these kinds of string encodings are incredibly useful for computer programs to ingest, process, and output printable characters for other computer programs to use.

But Alice is a person. This style of encoding is not meant for her.

What we need is a better way to encode 256 bits' worth of random decisions, but in a format that makes sense for people to remember or write down.

xkcd password strength comic

[Image source: xkcd.com]

The XKCD comic uses the same principle you learned about in the first discussion lab:

all information is data, and all data can be encoded into bits.

Specifically, let's think about the ways that we can store 12 bits worth of data.

  • We can write the data in binary, for example $0101 1010 0011$.
  • We can write the data as hex characters as $6a4$.
  • We can write the data in base 64 as "Wj"

Or we can choose any other representation that gives us $2^{12}$ possible values that the data can take... perhaps one that is even easier for humans to remember.

There is another Bitcoin Improvement Protocol standard made for this purpose: creating human-memorable seeds. It defines a set of $2^{11} = 2048$ words and records every 11 bits of the master key using one word.

Let's look at its set of words. I'll use English in this example, although other languages are specified in the standard too.

In [31]:
# source: https://github.com/trezor/python-mnemonic

# pip install mnemonic
from mnemonic import Mnemonic
mnemo = Mnemonic("english")
print(mnemo.wordlist[:10])  # first ten words
print(mnemo.wordlist[-10:]) # last ten words
['abandon', 'ability', 'able', 'about', 'above', 'absent', 'absorb', 'abstract', 'absurd', 'abuse']
['yard', 'year', 'yellow', 'you', 'young', 'youth', 'zebra', 'zero', 'zone', 'zoo']

We can create a master key by sampling 24 of these words at random, each of which encodes 11 bits of data. So collectively, they encode the 256 bits of entropy that we need.

In [26]:
# CAUTION: this code is just for demonstration purposes.
# Do NOT actually use this seed phrase to hold any money!

words = mnemo.generate(strength=256)
print(words)

seed = mnemo.to_seed(words, passphrase="")
print("\nBase 64 encoding:", b64encode(seed))
now creek enemy erupt jungle cigar book believe much defy provide loan tank empty force news practice night proud search top best genuine spoon

Base 64 encoding: b'7l/p3Kpq3CZIJdQNfEGvNjmPqqKnWpRuYsMHPDCNMM04VIKdsTlKp2VWr4BT6QFQi83F2HtnWNfqQ/EZXWAurw=='

8.2 Scripts on the Blockchain¶

So far, we have said that a transaction has the form "Alice pays 1 coin to Bob."

Actually, this is not quite true. Bitcoin allows for more complicated transactions than this.

A Bitcoin transaction is actually a computer script that looks like this.

In words, a Bitcoin transaction involves Alice posting her own public key and the hash of Bob's public key. This hash is called Bob's wallet "address."

In the particular transaction above, the hash of the recipient's public key is 8ec...b1.

The transaction can be thought of as a computer program that says:

"for anyone who shows up later with a public key and claims to be the recipient of this transaction, take the hash of their public key and check if it equals 8ec...b1. If so, give my coins to that person."

In fact, there are several valid operations that can be performed in a Bitcoin transaction.

The scripting language is not Turing-complete; that is, you can't do everything. But there are many kinds of transactions that you can do.

Relatedly, this is why some Bitcoin transactions take up more space than others: they can be more complicated computer programs.

bitcoin blocks

The Ethereum Cryptocurrency¶

Ethereum is another cryptocurrency. It was created in 2013 -- so, four years after Bitcoin, but still awhile ago.

Ethereum takes the idea of scripting much further than Bitcoin does.

The Ethereum blockchain is essentially a global computer where everyone agrees on the state of this computer. The global state is replicated on all nodes that are in the Ethereum network. The Ethereum state can only be changed by users issuing transactions.

[Source: Preethi Kasireddy's blog post]

Similarities. Ethereum shares several features in common with Bitcoin, such as:

  • a peer-to-peer network that connects users,
  • a consensus algorithm to agree on state updates, and
  • using cryptographic primitives like digital signatures and hashes.

Additionally, Ethereum also has its own digital currency, known as ether.

Just like Bitcoin transactions, Ethereum transactions are grouped into blocks and broadcasted into the network. Any person can use the global computer by issuing transactions and paying the required fees in its currency, which is called "Ether."

Every few seconds a leader gets elected, and that leader chooses which transactions go in the next block.

[Source: Preethi Kasireddy's blog post]

Differences. However, Ethereum's primary objective is not to function as a digital currency payment network, and Ether is not intended to be a conventional currency. Instead,

  • Ether is designed to be a utility currency that pays for the use of the Ethereum world computer.
  • Transactions on the Ethereum blockchain explain how to move from one state of the machine to the next.

[Source: Ethereum docs]

Also, Ethereum is not restricted to a limited scripting language like Bitcoin's Script. Ethereum's scripting language is Turing-complete, so it can operate as a general-purpose computer and execute code with more complexity.

You can think of Ethereum as a state machine. Ethereum's state is a large data structure which holds not only all accounts and balances, but a machine state, which can change from block to block according to a pre-defined set of rules, and which can execute arbitrary machine code.

The specific rules of changing state from block to block are defined by the Ethereum Virtual Machine (EVM). Every transaction causes the EVM to run which causes the global state to change. The order of the transaction is very important! The EVM is single threaded.

[Source Ethereum docs]

Smart Contracts¶

Why use a bunch of computers in order to create one global computer?

One benefit is that you can write software that makes and enforces financial decisions. In other words, you can create a contract that can be enforced via computer code.

For example, you could post a Sudoku puzzle to Ethereum and create a smart contract that says:

"I will pay 1 coin to the first person who posts a solution to the Sudoku puzzle in the next 5 minutes. Otherwise, I will reclaim my money."

Ethereum provides a programming language called Solidity in which to write such a smart contract. You must currently own 1 coin for this smart contract to be valid. If so, your coin is "locked" in the smart contract until it gets paid out (either to the puzzle-solver, or back to you).

Going one step further: you can even write contracts that create new coins on top of Ethereum!

Ethereum contains a standard interface for fungible tokens on the Ethereum blockchain, called ERC20. It is a set of rules that define how a token contract should behave and interact with other contracts and wallets on the Ethereum network.

For instance, you could create a new kind of coin called Sudokoin and say that anyone who solves a Sudoku puzzle earns 1 Sudokoin.

Here is open-source Solidity code to do exactly this! (Whether or not this coin has any value is a different matter...)

Warning. If you think that the idea of blending "computer code" with "irreversible financial transactions" can lead to disaster, then you would be right!

Bugs in Ethereum smart contracts can lead to your money being sent to the wrong place, or being locked forever. For this reason, smart contracts should be (and sometimes are) carefully checked by human code reviewers and formally verified by computers.

8.3 Common Knowledge¶

Remember that the most important property of the Bitcoin blockchain is that all participants eventually reach agreement over the state of its public ledger, called the blockchain.

19th century ledger

Source: Wikipedia

This is a deceptively powerful property. It means that everyone in the Bitcoin (or Ethereum) network has common knowledge.

Common knowledge is a powerful concept in logic.

Consider the following statements. Each one of them is true, but note that each statement is strictly stronger than the previous ones!

  • I (the instructor) have black hair.
  • You know that I have black hair.
  • I know that you know that I have black hair.
  • You know that I know that you know that I have black hair.
  • ...

The Muddy Children Puzzle¶

To see the power of common knowledge, consider the following problem.

  • There are $n$ children playing in the playground.
  • After playing, the teacher gathers the children in a circle and announces that "one or more of you have mud on your forehead."
  • Each child can see if others have mud on their forehead, but cannot tell whether their own forehead is muddy.

The teacher says, "at this moment, if you know you have mud on your forehead, please step forward."

The children are not allowed to communicate with each other.

Question. How will the children decide whether to respond?

After one minute passes, the teacher says again "at this moment, if you know you have mud on your forehead, please step forward."

After yet another minute passes, the teacher issues the same call once again.

Question. What happens now? Remember that the children cannot communicate with each other. Their only form of communication is to step forward if they wish to answer that yes, they do have mud on their forehead.

Reasoning about the puzzle¶

In this puzzle, the children

  • Start with the mutual knowledge that "one or more of you have mud on your forehead."
  • Develop the common knowledge over time that nobody else is confident in answering the question just yet!

Let's see what happens in each round.

In the first round:

  • If a child knows the mutual knowledge but looks around and sees nobody with mud on their forehead, they must be the only muddy child.
  • In this case, they will respond to the teacher (but nobody else will).

In the second round:

  • The children have more information! Now they know the mutual knowledge and the fact that nobody was confident to speak in the first round. As a consequence, there must exist at least two children with a muddy forehead.
  • As a result, any child who only sees one muddy forehead can be convinced that they are the second muddy child, and they will announce this fact.

Continuing via induction: it turns out that if there are $k$ muddy children, then nobody will speak for $k-1$ rounds and then al $k$ muddy children will make a (simultaneous) announcement to the teacher in round $k$.

Amazingly, the children have learned so much information from each other without saying anything at all!

In the same way, if Alice sends 1 coin to Bob on the Bitcoin blockchain, then:

  • Bob knows he received 1 coin, and
  • Bob knows that everyone else also knows that he received 1 coin, so that he is sure that the rest of the world will allow him to spend it later.

And this is true even though Bob doesn't communicate with the entire world directly.

8.4 The Path Forward¶

This brings us near the end of Part 2 of this course. In this part of the class, you have seen how to:

  • Describe the goal of decentralization and distributed consensus, in the context of financial payments via cryptocurrencies.
  • Create transactions on Bitcoin and Ethereum, determine whether a potential transaction is valid, and explain the mining algorithm and its role in incentivizing honest participation in the network.
  • Analyze public blockchains using computational and data science tools, and develop new applications on a blockchain.

(You'll see the last part in Shreyas' lecture next Tuesday.)

Importantly: agreement is possible even though the Bitcoin participants do not have a common communication channel where one person posts a message and everyone can see it.

Instead, they need to 'gossip' messages they've heard over the network. Even though some people in the network might be lying, as long as enough of the computing power in the network is honest, then everyone eventually reaches agreement over the same set of facts.

The idea that a bunch of computers over the Internet can reach agreement in the face of an adversary is kind of amazing!

It provides an elegant solution to integrity and availability problems in a variety of applications that go beyond money:

  • Sending a message to your friends and making sure that all of them hear it, and that everyone has the common knowledge that everyone else has heard it too. This is called the broadcast problem.

  • Conducting an election or any other decision-making process where it is important to make sure that everyone has consensus on which decision has been selected. This is called the Byzantine agreement problem.

  • Designing a high-assurance computer server that continues to work even if some of the individual CPUs become corrupted or faulty. This is called Byzantine fault-tolerance.

We will talk about these applications in Part 3 of the course.

Also, the Bitcoin blockchain has (at least) two major issues that we will come back to after spring break.

Issue #1: privacy. In Bitcoin, all transactions are public knowledge. Even though your name is not published on the blockchain directly, it may be possible to trace the owner of each account.

We discussed the limitations of pseudonymity in today's coffee shop example, and we will return to this point later in the semester.

Issue #2: energy inefficiency. Bitcoin's proof-of-work puzzle solves the agreement problem, but it is incredibly computationally intensive.

The U.S. Department of Energy released a report two weeks ago that estimated that "electricity use from cryptocurrency mining probably represents from 0.6% to 2.3% of U.S. electricity consumption" (source).

The good news is that there are other ways to solve the agreement problem that do not require this kind of computing power.

We will talk about both of these scientific issues, and how to resolve them, in more detail in Part 4 of this course.

Finally, we will have a broader discussion of the impact of crypto and cybersecurity technology on society, and laws and policy regulations, in the final part of the course.